label scarcity
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > Bermuda (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area (0.67)
- Information Technology > Networks (0.42)
Understanding When Graph Convolutional Networks Help: A Diagnostic Study on Label Scarcity and Structural Properties
Subedi, Nischal, Kerstetter, Ember, Li, Winnie, Murphy, Silo
Graph Convolutional Networks (GCNs) have become a standard approach for semi-supervised node classification, yet practitioners lack clear guidance on when GCNs provide meaningful improvements over simpler baselines. We present a diagnostic study using the Amazon Computers co-purchase data to understand when and why GCNs help. Through systematic experiments with simulated label scarcity, feature ablation, and per-class analysis, we find that GCN performance depends critically on the interaction between graph homophily and feature quality. GCNs provide the largest gains under extreme label scarcity, where they leverage neighborhood structure to compensate for limited supervision. Surprisingly, GCNs can match their original performance even when node features are replaced with random noise, suggesting that structure alone carries sufficient signal on highly homophilous graphs. However, GCNs hurt performance when homophily is low and features are already strong, as noisy neighbors corrupt good predictions. Our quadrant analysis reveals that GCNs help in three of four conditions and only hurt when low homophily meets strong features. These findings offer practical guidance for practitioners deciding whether to adopt graph-based methods.
Few Shot Semi-Supervised Learning for Abnormal Stop Detection from Sparse GPS Trajectories
Sabir, Muhammad Ayub, Pang, Junbiao, Wu, Jiaqi, Ashraf, Fatima
Abnormal stop detection (ASD) in intercity coach transportation is critical for ensuring passenger safety, operational reliability, and regulatory compliance. However, two key challenges hinder ASD effectiveness: sparse GPS trajectories, which obscure short or unauthorized stops, and limited labeled data, which restricts supervised learning. Existing methods often assume dense sampling or regular movement patterns, limiting their applicability. To address data sparsity, we propose a Sparsity-Aware Segmentation (SAS) method that adaptively defines segment boundaries based on local spatial-temporal density. Building upon these segments, we introduce three domain-specific indicators to capture abnormal stop behaviors. To further mitigate the impact of sparsity, we develop Locally Temporal-Indicator Guided Adjustment (LTIGA), which smooths these indicators via local similarity graphs. To overcome label scarcity, we construct a spatial-temporal graph where each segment is a node with LTIGA-refined features. We apply label propagation to expand weak supervision across the graph, followed by a GCN to learn relational patterns. A final self-training module incorporates high-confidence pseudo-labels to iteratively improve predictions. Experiments on real-world coach data show an AUC of 0.854 and AP of 0.866 using only 10 labeled instances, outperforming prior methods. The code and dataset are publicly available at \href{https://github.com/pangjunbiao/Abnormal-Stop-Detection-SSL.git}
- Research Report (0.64)
- Instructional Material (0.54)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > Bermuda (0.04)
- (3 more...)
Blend the Separated: Mixture of Synergistic Experts for Data-Scarcity Drug-Target Interaction Prediction
Zhai, Xinlong, Wang, Chunchen, Wang, Ruijia, Kang, Jiazheng, Li, Shujie, Chen, Boyu, Ma, Tengfei, Zhou, Zikai, Yang, Cheng, Shi, Chuan
Drug-target interaction prediction (DTI) is essential in various applications including drug discovery and clinical application. There are two perspectives of input data widely used in DTI prediction: Intrinsic data represents how drugs or targets are constructed, and extrinsic data represents how drugs or targets are related to other biological entities. However, any of the two perspectives of input data can be scarce for some drugs or targets, especially for those unpopular or newly discovered. Furthermore, ground-truth labels for specific interaction types can also be scarce. Therefore, we propose the first method to tackle DTI prediction under input data and/or label scarcity. To make our model functional when only one perspective of input data is available, we design two separate experts to process intrinsic and extrinsic data respectively and fuse them adaptively according to different samples. Furthermore, to make the two perspectives complement each other and remedy label scarcity, two experts synergize with each other in a mutually supervised way to exploit the enormous unlabeled data. Extensive experiments on 3 real-world datasets under different extents of input data scarcity and/or label scarcity demonstrate our model outperforms states of the art significantly and steadily, with a maximum improvement of 53.53%. We also test our model without any data scarcity and it still outperforms current methods.
GAI-Enabled Explainable Personalized Federated Semi-Supervised Learning
Peng, Yubo, Jiang, Feibo, Dong, Li, Wang, Kezhi, Yang, Kun
Federated learning (FL) is a commonly distributed algorithm for mobile users (MUs) training artificial intelligence (AI) models, however, several challenges arise when applying FL to real-world scenarios, such as label scarcity, non-IID data, and unexplainability. As a result, we propose an explainable personalized FL framework, called XPFL. First, we introduce a generative AI (GAI) assisted personalized federated semi-supervised learning, called GFed. Particularly, in local training, we utilize a GAI model to learn from large unlabeled data and apply knowledge distillation-based semi-supervised learning to train the local FL model using the knowledge acquired from the GAI model. In global aggregation, we obtain the new local FL model by fusing the local and global FL models in specific proportions, allowing each local model to incorporate knowledge from others while preserving its personalized characteristics. Second, we propose an explainable AI mechanism for FL, named XFed. Specifically, in local training, we apply a decision tree to match the input and output of the local FL model. In global aggregation, we utilize t-distributed stochastic neighbor embedding (t-SNE) to visualize the local models before and after aggregation. Finally, simulation results validate the effectiveness of the proposed XPFL framework.
- Europe > United Kingdom > England > Greater London > London (0.14)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Hunan Province > Changsha (0.04)
- Europe > United Kingdom > England > Essex (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Education (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.82)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
Song Emotion Classification of Lyrics with Out-of-Domain Data under Label Scarcity
Sakunkoo, Jonathan, Sakunkoo, Annabella
Songs have been found to profoundly impact human emotions, with lyrics having significant power to stimulate emotional changes in the audience. There is a scarcity of large, high quality in-domain datasets for lyrics-based song emotion classification (Edmonds and Sedoc, 2021; Zhou, 2022). It has been noted that in-domain training datasets are often difficult to acquire (Zhang and Miao, 2023) and that label acquisition is often limited by cost, time, and other factors (Azad et al., 2018). We examine the novel usage of a large out-of-domain dataset as a creative solution to the challenge of training data scarcity in the emotional classification of song lyrics. We find that CNN models trained on a large Reddit comments dataset achieve satisfactory performance and generalizability to lyrical emotion classification, thus giving insights into and a promising possibility in leveraging large, publicly available out-of-domain datasets for domains whose in-domain data are lacking or costly to acquire.
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts (0.04)
- (2 more...)
- Media (0.74)
- Health & Medicine (0.48)
Rank and Align: Towards Effective Source-free Graph Domain Adaptation
Luo, Junyu, Xiao, Zhiping, Wang, Yifan, Luo, Xiao, Yuan, Jingyang, Ju, Wei, Liu, Langechuan, Zhang, Ming
Graph neural networks (GNNs) have achieved impressive performance in graph domain adaptation. However, extensive source graphs could be unavailable in real-world scenarios due to privacy and storage concerns. To this end, we investigate an underexplored yet practical problem of source-free graph domain adaptation, which transfers knowledge from source models instead of source graphs to a target domain. To solve this problem, we introduce a novel GNN-based approach called Rank and Align (RNA), which ranks graph similarities with spectral seriation for robust semantics learning, and aligns inharmonic graphs with harmonic graphs which close to the source domain for subgraph extraction. In particular, to overcome label scarcity, we employ the spectral seriation algorithm to infer the robust pairwise rankings, which can guide semantic learning using a similarity learning objective. To depict distribution shifts, we utilize spectral clustering and the silhouette coefficient to detect harmonic graphs, which the source model can easily classify. To reduce potential domain discrepancy, we extract domain-invariant subgraphs from inharmonic graphs by an adversarial edge sampling process, which guides the invariant learning of GNNs. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our proposed RNA.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Beijing > Beijing (0.04)
Semi-supervised Domain Adaptation in Graph Transfer Learning
Qiao, Ziyue, Luo, Xiao, Xiao, Meng, Dong, Hao, Zhou, Yuanchun, Xiong, Hui
As a specific case of graph transfer learning, unsupervised domain adaptation on graphs aims for knowledge transfer from label-rich source graphs to unlabeled target graphs. However, graphs with topology and attributes usually have considerable cross-domain disparity and there are numerous real-world scenarios where merely a subset of nodes are labeled in the source graph. This imposes critical challenges on graph transfer learning due to serious domain shifts and label scarcity. To address these challenges, we propose a method named Semi-supervised Graph Domain Adaptation (SGDA). To deal with the domain shift, we add adaptive shift parameters to each of the source nodes, which are trained in an adversarial manner to align the cross-domain distributions of node embedding, thus the node classifier trained on labeled source nodes can be transferred to the target nodes. Moreover, to address the label scarcity, we propose pseudo-labeling on unlabeled nodes, which improves classification on the target graph via measuring the posterior influence of nodes based on their relative position to the class centroids. Finally, extensive experiments on a range of publicly accessible datasets validate the effectiveness of our proposed SGDA in different experimental settings.
GraphFC: Customs Fraud Detection with Label Scarcity
Singh, Karandeep, Tsai, Yu-Che, Li, Cheng-Te, Cha, Meeyoung, Lin, Shou-De
Custom officials across the world encounter huge volumes of transactions. With increased connectivity and globalization, the customs transactions continue to grow every year. Associated with customs transactions is the customs fraud - the intentional manipulation of goods declarations to avoid the taxes and duties. With limited manpower, the custom offices can only undertake manual inspection of a limited number of declarations. This necessitates the need for automating the customs fraud detection by machine learning (ML) techniques. Due the limited manual inspection for labeling the new-incoming declarations, the ML approach should have robust performance subject to the scarcity of labeled data. However, current approaches for customs fraud detection are not well suited and designed for this real-world setting. In this work, we propose $\textbf{GraphFC}$ ($\textbf{Graph}$ neural networks for $\textbf{C}$ustoms $\textbf{F}$raud), a model-agnostic, domain-specific, semi-supervised graph neural network based customs fraud detection algorithm that has strong semi-supervised and inductive capabilities. With upto 252% relative increase in recall over the present state-of-the-art, extensive experimentation on real customs data from customs administrations of three different countries demonstrate that GraphFC consistently outperforms various baselines and the present state-of-art by a large margin.
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- Asia > Indonesia (0.04)
- (8 more...)